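We investigate how effective an attacker can be when it learns only from its victim's actions, without access to the victim's reward. This work is motivated by the scenario in which the attacker wants to act strategically when the victim's motivations are unknown. We argue that one heuristic an attacker can use is to maximize the entropy of the victim's policy. The policy is generally not obfuscated, which means it can be extracted simply by passively observing the victim. We provide such a strategy in the form of a reward-free exploration algorithm that maximizes the attacker's entropy during the exploration phase and then maximizes the victim's empirical entropy during the planning phase. In our experiments, the victim agents are subverted through policy entropy maximization, implying that an attacker may not need access to the victim's reward to succeed. Hence, reward-free attacks, which are based only on observed behavior, show that an attacker can feasibly act strategically without knowing the victim's motives, even if the victim's reward information is protected.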
Translated by Google Translate
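The quantity the attacker targets in the planning phase, the victim's empirical policy entropy, can be estimated from passive observation alone. The following is a minimal sketch, not the authors' algorithm: it assumes discrete, hashable states and actions, and the function name `empirical_policy_entropy` and the `(state, action)` log format are hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def empirical_policy_entropy(observations):
    """Estimate the per-state entropy (in nats) of an observed policy.

    `observations` is an iterable of (state, action) pairs collected by
    passively watching the victim. This log format is an assumption made
    for illustration; it is not taken from the paper.
    """
    # Count how often each action was taken in each state.
    counts = defaultdict(Counter)
    for state, action in observations:
        counts[state][action] += 1

    entropies = {}
    for state, action_counts in counts.items():
        total = sum(action_counts.values())
        # Shannon entropy of the empirical action distribution in this state.
        entropies[state] = -sum(
            (n / total) * math.log(n / total) for n in action_counts.values()
        )
    return entropies


if __name__ == "__main__":
    log = [("s0", "left"), ("s0", "right"), ("s1", "left"), ("s1", "left")]
    print(empirical_policy_entropy(log))
```

A state in which the victim always takes the same action has zero entropy, while a state with evenly split actions has the maximum entropy for that number of actions. An attacker following the heuristic above would prefer plans that steer the victim toward the latter kind of state.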
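Transfer learning (TL) leverages previously acquired knowledge to learn new tasks efficiently and has been used to train deep learning (DL) models with a limited amount of data. When TL is applied to DL, pretrained (teacher) models are fine-tuned to build domain-specific (student) models. This fine-tuning relies on the fact that a DL model can be decomposed into a classifier and a feature extractor, and a line of studies has shown that the same feature extractor can be used to train classifiers on multiple tasks. Furthermore, recent studies have proposed multiple algorithms that fine-tune the teacher model's feature extractor to train student models more efficiently. We note that, regardless of how the feature extractor is fine-tuned, the student model's classifier is trained on the final output of the feature extractor (i.e., the output of the penultimate layer). However, a recent study suggested that feature maps across layers in ResNets could be functionally equivalent, raising the possibility that feature maps inside the feature extractor can also be used to train the student model's classifier. Inspired by this study, we tested whether feature maps in the hidden layers of the teacher model can be used to improve the student model's accuracy (i.e., the efficiency of TL). Specifically, we developed "adaptive transfer learning (ATL)", which can choose an optimal set of feature maps for TL, and tested it in the few-shot learning setting. Our empirical evaluations suggest that ATL can help DL models learn more efficiently, especially when the available examples are limited.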
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
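To make the greedy rule concrete, the sketch below performs one selection step under the oracle assumption the abstract mentions: the full joint distribution of discrete features and label is available as a probability table. It is an illustration only, not the paper's amortized learning method; the function names and the array layout (one axis per feature, label on the last axis) are assumptions, and the observed values are assumed to have nonzero probability.

```python
import numpy as np


def mutual_information(p_xy):
    """Mutual information (nats) of a 2-D table of non-negative weights."""
    p_xy = p_xy / p_xy.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # shape (k_x, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # shape (1, k_y)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])).sum())


def greedy_cmi_step(joint, observed):
    """Pick the unobserved feature with the largest I(X_i; Y | x_observed).

    joint    : array with axes (x_0, ..., x_{d-1}, y) holding probabilities
               (hypothetical layout chosen for this sketch).
    observed : dict mapping feature index -> observed value.
    """
    d = joint.ndim - 1
    # Condition on the observed values by slicing; keep every axis so that
    # feature indices still line up with array axes.
    index = tuple(
        slice(observed[i], observed[i] + 1) if i in observed else slice(None)
        for i in range(d)
    ) + (slice(None),)
    conditioned = joint[index]

    best_feature, best_mi = None, -1.0
    for i in range(d):
        if i in observed:
            continue
        # Marginalize out every feature except i, leaving a (k_i, k_y) table.
        other_axes = tuple(j for j in range(d) if j != i)
        mi = mutual_information(conditioned.sum(axis=other_axes))
        if mi > best_mi:
            best_feature, best_mi = i, mi
    return best_feature, best_mi


if __name__ == "__main__":
    # Toy distribution: y copies x_0, while x_1 is independent noise.
    joint = np.zeros((2, 2, 2))
    for x0 in range(2):
        for x1 in range(2):
            joint[x0, x1, x0] = 0.25
    print(greedy_cmi_step(joint, {}))
```

In the toy distribution the label is a copy of the first feature, so the greedy step queries that feature first; once it is observed, the remaining feature carries no further information about the label.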
We demonstrate how efficient autonomous drone swarms can be in detecting and tracking occluded targets in densely forested areas, such as lost people during search and rescue missions. Exploration and optimization of local viewing conditions, such as occlusion density and target view obliqueness, provide much faster and much more reliable results than previous, blind sampling strategies that are based on pre-defined waypoints. An adapted real-time particle swarm optimization and a new objective function are presented that are able to deal with dynamic and highly random through-foliage conditions. Synthetic aperture sensing is our fundamental sampling principle, and drone swarms are employed to approximate the optical signals of extremely wide and adaptable airborne lenses.
Sequential testing, always-valid $p$-values, and confidence sequences promise flexible statistical inference and on-the-fly decision making. However, unlike fixed-$n$ inference based on asymptotic normality, existing sequential tests either make parametric assumptions and end up under-covering/over-rejecting when these fail or use non-parametric but conservative concentration inequalities and end up over-covering/under-rejecting. To circumvent these issues, we sidestep exact at-least-$\alpha$ coverage and focus on asymptotically exact coverage and asymptotic optimality. That is, we seek sequential tests whose probability of ever rejecting a true hypothesis asymptotically approaches $\alpha$ and whose expected time to reject a false hypothesis approaches a lower bound on all tests with asymptotic coverage at least $\alpha$, both under an appropriate asymptotic regime. We permit observations to be both non-parametric and dependent and focus on testing whether the observations form a martingale difference sequence. We propose the universal sequential probability ratio test (uSPRT), a slight modification to the normal-mixture sequential probability ratio test, where we add a burn-in period and adjust thresholds accordingly. We show that even in this very general setting, the uSPRT is asymptotically optimal under mild generic conditions. We apply the results to stabilized estimating equations to test means, treatment effects, etc. Our results also provide corresponding guarantees for the implied confidence sequences. Numerical simulations verify our guarantees and the benefits of the uSPRT over alternatives.
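For orientation, the sketch below implements a textbook normal-mixture sequential probability ratio test with a burn-in period. It is not the uSPRT itself: the abstract does not state how the thresholds are adjusted, so the plain $1/\alpha$ threshold here is an assumption, as are the function name, the known variance `sigma2`, the mixing variance `tau2`, and the default burn-in length. With a $N(0, \tau^2)$ mixing distribution over the mean and running sum $S_n$, the mixture likelihood ratio is $\Lambda_n = \sqrt{\sigma^2 / (\sigma^2 + n\tau^2)}\, \exp\{\tau^2 S_n^2 / (2\sigma^2(\sigma^2 + n\tau^2))\}$.

```python
import math


def mixture_sprt(xs, alpha=0.05, sigma2=1.0, tau2=1.0, burn_in=10):
    """Normal-mixture SPRT for H0: the observations have mean zero.

    Returns the first sample index n (1-based) at which H0 is rejected,
    or None if the stream ends without rejection. No test is performed
    during the first `burn_in` observations. The unadjusted threshold
    1/alpha is an assumption of this sketch, not the paper's choice.
    """
    log_threshold = math.log(1.0 / alpha)
    running_sum = 0.0
    for n, x in enumerate(xs, start=1):
        running_sum += x
        if n <= burn_in:
            continue  # burn-in: keep accumulating, but do not test yet
        v = sigma2 + n * tau2
        # Log of the mixture likelihood ratio Lambda_n given above.
        log_lr = 0.5 * math.log(sigma2 / v) + tau2 * running_sum**2 / (
            2.0 * sigma2 * v
        )
        if log_lr >= log_threshold:
            return n
    return None


if __name__ == "__main__":
    print(mixture_sprt([1.0] * 100))   # constant drift away from zero
    print(mixture_sprt([0.0] * 100))   # no evidence against H0
```

A stream with a steady drift is rejected at the first check after the burn-in, while a stream of zeros never crosses the threshold because the log likelihood ratio stays negative.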
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLM models for clinical applications.
Transformers have been essential to pretraining success in NLP. Other architectures have been used, but require attention layers to match benchmark accuracy. This work explores pretraining without attention. We test recently developed routing layers based on state-space models (SSM) and model architectures based on multiplicative gating. Used together these modeling choices have a large impact on pretraining accuracy. Empirically the proposed Bidirectional Gated SSM (BiGS) replicates BERT pretraining results without attention and can be extended to long-form pretraining of 4096 tokens without approximation.
In this paper, we present strong baselines for the task of Feedback Comment Generation for Writing Learning. Given a sentence and an error span, the task is to generate a feedback comment explaining the error. Sentences and feedback comments are both in English. We experiment with LLMs and also create multiple pseudo datasets for the task, investigating how it affects the performance of our system. We present our results for the task along with extensive analysis of the generated comments with the aim of aiding future studies in feedback comment generation for English language learners.
In order for automated mobile vehicles to navigate in the real world with minimal collision risks, it is necessary for their planning algorithms to consider uncertainties from measurements and environmental disturbances. In this paper, we consider analytical solutions for a conservative approximation of the mutual probability of collision between two robotic vehicles in the presence of such uncertainties. Therein, we present two methods, which we call unitary scaling and principal axes rotation, for decoupling the bivariate integral required for efficient approximation of the probability of collision between two vehicles including orientation effects. We compare the conservatism of these methods analytically and numerically. By closing a control loop through a model predictive guidance scheme, we observe through Monte-Carlo simulations that directly implementing collision avoidance constraints from the conservative approximations remains infeasible for real-time planning. We then propose and implement a convexification approach based on the tightened collision constraints that significantly improves the computational efficiency and robustness of the predictive guidance scheme.
Static subword tokenization algorithms have been an essential component of recent works on language modeling. However, their static nature results in important flaws that degrade the models' downstream performance and robustness. In this work, we propose MANTa, a Module for Adaptive Neural TokenizAtion. MANTa is a differentiable tokenizer trained end-to-end with the language model. The resulting system offers a trade-off between the expressiveness of byte-level models and the speed of models trained using subword tokenization. In addition, our tokenizer is highly explainable since it produces an explicit segmentation of sequences into blocks. We evaluate our pre-trained model on several English datasets from different domains as well as on synthetic noise. We find that MANTa improves robustness to character perturbations and out-of-domain data. We then show that MANTa performs comparably to other models on the general-domain GLUE benchmark. Finally, we show that it is considerably faster than strictly byte-level models.